An Analysis of the Application of Simplified Silhouette to the Evaluation of k-means Clustering Validity

نویسندگان

  • Fei Wang
  • Hector-Hugo Franco-Penya
  • John D. Kelleher
  • John Pugh
  • Robert J. Ross
چکیده

Silhouette is one of the most popular and effective internal measures for the evaluation of clustering validity. Simplified Silhouette is a computationally simplified version of Silhouette. However, to date Simplified Silhouette has not been systematically analysed in a specific clustering algorithm. This paper analyses the application of Simplified Silhouette to the evaluation of k-means clustering validity and compares it with the k-means Cost Function and the original Silhouette from both theoretical and empirical perspectives. The theoretical analysis shows that Simplified Silhouette has a mathematical relationship with both the k-means Cost Function and the original Silhouette, while empirically, we show that it has comparative performances with the original Silhouette, but is much faster in calculation. Based on our analysis, we conclude that for a given dataset the k-means Cost Function is still the most valid and efficient measure in the evaluation of the validity of k-means clustering with the same k value, but that Simplified Silhouette is more suitable than the original Silhouette in the selection of the best result from k-means clustering with different k values.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Modification of the Fast Global K-means Using a Fuzzy Relation with Application in Microarray Data Analysis

Recognizing genes with distinctive expression levels can help in prevention, diagnosis and treatment of the diseases at the genomic level. In this paper, fast Global k-means (fast GKM) is developed for clustering the gene expression datasets. Fast GKM is a significant improvement of the k-means clustering method. It is an incremental clustering method which starts with one cluster. Iteratively ...

متن کامل

A hybrid DEA-based K-means and invasive weed optimization for facility location problem

In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...

متن کامل

Assessment of the Performance of Clustering Algorithms in the Extraction of Similar Trajectories

In recent years, the tremendous and increasing growth of spatial trajectory data and the necessity of processing and extraction of useful information and meaningful patterns have led to the fact that many researchers have been attracted to the field of spatio-temporal trajectory clustering. The process and analysis of these trajectories have resulted in the extraction of useful information whic...

متن کامل

A Clustering Based Location-allocation Problem Considering Transportation Costs and Statistical Properties (RESEARCH NOTE)

Cluster analysis is a useful technique in multivariate statistical analysis. Different types of hierarchical cluster analysis and K-means have been used for data analysis in previous studies. However, the K-means algorithm can be improved using some metaheuristics algorithms. In this study, we propose simulated annealing based algorithm for K-means in the clustering analysis which we refer it a...

متن کامل

Weighted Ensemble Clustering for Increasing the Accuracy of the Final Clustering

Clustering algorithms are highly dependent on different factors such as the number of clusters, the specific clustering algorithm, and the used distance measure. Inspired from ensemble classification, one approach to reduce the effect of these factors on the final clustering is ensemble clustering. Since weighting the base classifiers has been a successful idea in ensemble classification, in th...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017